<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2" />
<title>6.863 Project: Policy Parser</title>
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>

	<!-- Begin Wrapper -->
	
		<div id="wrapper">
		
		<!-- Begin Header -->
			
			<div id="header">
				<h1>Policy Parser</h1>
				<h2>6.863 Spring 2008 Project</h2>		 
			</div>
			
		<!-- End Header -->
			
		<!-- Begin Strap -->
		
        <div id="strap">
<b>Policy Parser</b> &raquo; Implementation</div>
		
		<!-- End Strap -->
		 
		<!-- Begin Navigation -->
		
		<div id="navigation">
			<ul id="menu">
				<li><a href="index.html">Home</a></li>
				<li><a href="intro.html">Intro</a></li>
			    <li><a href="try.html">Try It Now!</a></li>
	            <li><a href="#">Implementation</a></li>
                <li><a href="results.html">Results</a></li>
	            <li><a href="future.html">Future Directions</a></li>
	            <li><a href="download.html">Download Code</a></li>
 			 	<li><a href="members.html">Members</a></li>
			</ul>
		</div>
		
		<!-- End Navigation -->
		 
		<!-- Begin Content-->

		<div id="content">
		    <h2> Implementation of the Policy Parser</h2>
			<br/>
			<div id="panel">
			<b>
				<a href="#intro">Introduction</a><br/>
				<a href="#assumptions">Assumptions on User Input</a><br/>
				<a href="#nltk">NTLK Parser</a><br/>
				<a href="#interpreter">Policy Interpreter</a><br/>
				<a href="#rdf">RDF Generator</a><br/>
				<a href="#ref">References</a><br/>
				<br/>
			</b>
			</div>
			<hr/>
			<br/>
			
			<h2><a name="intro">Introduction</a></h2><br/><br/>
			
			<p> As briefly described in
			the <a href="intro.html">Intro</a> section, the Policy
              Parser consists of three major components: the 
			  Parser, which uses a feature-based context free grammar to extract semantic information of interest, the Policy Interpreter, which analyzes and processes the semantic information, and the RDF
			  Generator, which generates the corresponding AIR policy in the <a href="http://www.w3.org/DesignIssues/Notation3">RDF/N3 format</a>.  The system also
			  includes the online RDF policy store as well as the AIR
			  reasoner, but these can be considered peripheral to the
			  above three components.  In this section, we will describe each
			  of these components in detail. Before we dive
			  into the details, we first define the scope
			  of our project with respect to the types of user input that
			  the Policy Parser is designed to handle. 
            </p><br>
            <p><b><h2><a name="assumptions">Assumptions on User Input:</a></h2></b></p><br>
            <p>Automatically parsing an arbitrary natural
              language sentence is an extremely challenging task, and
              is beyond the scope of this project. Thus, we focus on a constrained set
              of natural language sentence forms. Past research has shown that constrained natural language can be effectively used for automatically generating policies, and that unconstrained natural language 
              gives significantly poorer results <a href="#1">[1]</a>, <a href="#2">[2]</a>.  More
              specifically, our system assumes a specific structure for a natural language policy which is as follows:
            </p><br>
            <p>
              <tt>subject mod action [object] [secondary_object] 
                [for/to purpose] [if condition (and condition)*]</tt>
            </p><br> 
            <p>
              where the meanings of the constituents are the following:
              </p><ul id="panel">
                <li> <tt>subject</tt>: The main actor or entity who
                carries out an action 
                </li><li> <tt>mod</tt>: The modality of the rule (i.e., whether the subject is allowed or disallowed to perform the specified action under the specified constraints, expressed using words such as "can", "cannot" etc.)
                </li><li> <tt>action</tt>: The action performed by the
                subject 
                </li><li> <tt>object</tt>: The object on which the action
                is performed (when "action" is a transitive verb)
                </li><li> <tt>secondary object</tt>: the secondary object for the action (applicable only if "action" is a ditransitive verb)
                </li><li> <tt>purpose</tt>: an (optional) purpose for which the
                action is performed
                </li><li> <tt>condition</tt>: an (optional) condition (or set of conditions) under which the rule is applicable
              </li></ul>
            <br><p> 
For example, consider the sentence "Police may search people's homes for a criminal investigation if the police has a warrant". Here, we have the following key value pairs:
	   </p> <br>
            <ul id="panel">
<li> subject: police
</li><li> mod: yes
</li><li> action: search
</li><li> object: people's homes
</li><li> secondary_object: <i>none</i>
</li><li> purpose: criminal investigation
</li><li> condition: police has a warrant
</li></ul>
<br>

            <p>
              Each sentence corresponds to a <i>single</i> policy. 
              The constituents between '[' and ']' are optional; therefore, an
              input sentence must consist of, at minimum, 
              a subject, an action
              and the modality of the rule. Also, '*' means "one or more";
              thus, a policy may consist of any number of conditions.
</p><br>
<p><b><h2><a name="nltk">NTLK Parser:</a></h2></b></p><br>
<p>
Given a user's input policy, the first step is to parse this sentence into a structured format that can be manipulated further by a computer. As stated earlier, 
we do this using FCFGs. 
              Each of the constituents of a policy statement (subject, action etc.) are further composed of simpler constituents such as noun phrases (NPs), 
              verb phrases (VPs) etc. The permissible forms for each constituent as well as its sub-parts, and the relation of a node's semantic value with that 
              of its children is specified by the FCFG. 

The start node of the FCFG (S) has a feature corresponding to each of the constituents (subject, action etc.) mentioned above.
Since we are interested in the semantic interpretation of the natural language policy, we use the <i>sem</i> feature to record semantic values and propagate 
them up the parse tree by combining the semantic values of a node's children <a href="#7">[7]</a>. The values of features like <i>subject, action, object etc. </i> 
are derived from the values of the <i>sem</i> features of various constituents. Since we assume a rather constrained form of input, we are able to develop a 
simple grammar that parses sentences of the form described above and is able to infer the semantic role of a word in the sentences from its position in the 
parse tree relative to other constituents. In particular, we don't have to worry about many complications that arise in unconstrained natural language with 
quantifiers. 
</p><br>

<p>
The words recognized by the system and their categories (N,V etc.) are included in a lexicon file. Spelling changes for words in the lexicon are handled by a 
kimmo rule (yaml) file. We have thus modularized our implementation so that the feature-based CFG is decoupled from the lexicon and the spelling rule changes. 
Also, the lexicon and the spelling change rules allow us to do some clever things, for example, the two sentence snippets "...for criminal investigation..."
 and "...to investigate crime..." lead to the same value for the feature <i>purpose</i> (purpose = (investigate crime)). This gives the user of the system 
 some freedom in expressing a policy. This is particularly important in light of the way the AIR policy associates sentence fragments with predefined ontologies 
 (as explained in the section on the <a href="#rdf"><b>RDF Generator</b></a>). This and other such techniques may be used to make the system more flexible. 
 
</p><br>
              For example, the sentence above, "Police may search people's homes for criminal investigation if the police have a warrant" is parsed
              into the following syntax tree: 
            </p><br>
            <p>
 		<br/>

		<div style="text-align: center;">
        	<a href="images/example_parse_tree.png" target="_blank"><img src="images/example_parse_tree.png" width="100%" alt="Example Parse Tree"/></a>
            <p>Figure 1: Example Parse Tree (click on the image to enlarge; image will be displayed in another window)</p><br/>
        </div>
           </p><br>
            <p>
              As shown in this tree, the left-hand side of 
              the top-level production rule 'S' contains a
              feature for each of the semantic constituents (for example, 
              the feature "action" for the action constituent in the
              policy) for the natural language rule. The value assigned to each top-level feature 
              is defined in terms of features from the subtrees. Note
              that the l.h.s. of each part-of-speech rule 
              (e.g. 'NP' for noun-phrase) is augmented with the 'sem'
              feature; this allows us to assign a logical expression
              to each node in the tree. 
Here, we use the application
              of the lambda function 'cat' to concatenate one or more lexicons
              into a single constituent.  
            </p><br>
            <p>

By explicitly defining a top-level feature corresponding to each constituent of interest, the implementation of the Policy Interpreter 
(the next component in the pipeline) is greatly simplified - it just
              needs to walk over
              the list of top-level features and map each to a
              corresponding RDF construct. 
            </p><br>
            <p>  
              We make us of the existing infrastructure in the Natural
              Language Toolkit (NLTK) <a href="#3">[3]</a> for this part of the Policy
              Parser.  More specifically, we use the featureparser class <a href="#4">[4]</a>, which in turn is built on top of the Natural Language 
              Toolkit for Python.
              In addition to a user's sentence, 
              the FCFG parser in NLTK requires three arguments; the
              FCFG grammar (stored as a .fcfg file), the spelling
              change rules (a Kimmo file), and the
              lexicon (.lex file). All of these files are available in
              the <a href="download.html">Download Code</a> section. 
            </p><br>
           
            <p><b><h2><a name="interpreter">Policy Interpreter:</a></h2></b></p><br>
            <p> Given a syntax tree generated by the NLTK Parser, the
              role of the Policy Interpreter is to extract the
              semantic information about the user's policy - i.e. the
              value for each of the constituents in the policy.  As
              mentioned above, this process is relatively
              straightforward, as it simply involves looking through 
              the list of features that are stored in the root of the syntax
              tree. The only intricate step in this process is walking
              over a lambda-calculus expression that is a
              concatenation of multiple lexical tokens (e.g. '(cat
              prox (cat card data))' for "prox card data"); this step is
              implemented using a recursive function for traversing an
              expression. 
            </p><br>
            <p>
              The output from the Policy Interpreter is a dictionary
              (i.e. a list of key-value pairs) that maps each
              constituent to its value.  For example, given the syntax
              tree for the sentence "Committee on Discipline may access
              prox card data for criminal investigation", the Policy
              Interpreter outputs the following dictionary: 
             </p><br>
			  <table border="0">
			    <tbody><tr>
                  <th align="left">Key</th>
                  <th align="left">Value</th>
                  <th align="left"></th>             
                </tr>
                <tr>
                  <td><tt>'POLICY'</tt></td>
                  <td><tt>'MIT Prox Card Policy'</tt></td>
                  <td>/* name of the policy */</td>
                </tr>
                  <tr><td><tt>'ENTITY'</tt></td>
                  <td><tt>'Committee on Discipline'&nbsp;&nbsp;</tt></td>
                  <td>/* actor */</td>
                </tr><tr>
                  <td><tt>'FLAG'</tt></td>
                  <td>True</td>
                  <td>/* modality (True for positive, such as 'can') */</td>
                </tr>
                <tr>
                  <td><tt>'ACTION'</tt></td>
                  <td><tt>'use'</tt></td>
                  <td></td>
                </tr>
                <tr>
                  <td><tt>'PURPOSE'</tt></td>
                  <td><tt>'investigate crime'</tt></td>
                  <td></td>
                </tr>
                <tr>
                  <td><tt>'DATA'</tt></td>
                  <td><tt>'prox card data'</tt></td>
                  <td>/* object */</td>
                </tr>
                <tr>
                  <td><tt>'CONDITION'</tt></td>
                  <td>[ ]</td>
                  <td></td>
                </tr>
                <tr>
                  <td><tt>'SECONDARY ENTITY'&nbsp;&nbsp;</tt></td>
                  <td>None</td>
                  <td></td>
                </tr>
              </tbody></table>
              <br> 
              Note that some of the keys in the dictionary have
              different names than the constituents, but there is a 1-1 
              correspondence between these two sets of names.  The
              renaming was done due to the fact that RDF uses a set of
              pre-designated identifiers for its constructs. 
            <br>   
            <p>
              The Policy Interpreter has been implemented in Python and
              can be found in the PolicyInterpreter directory of the
              <a href="download.html">downloadble code</a>. 
            </p><br>
              <!-- RDF Generator -->
            
            <p><b><h2><a name="rdf">RDF Generator:</a></h2></b></p>
           <br/>
            
            <p>This module is responsible for generating
              <a href="http://dig.csail.mit.edu/TAMI/2007/AIR/">AIR
                (Accountability In RDF)</a> [4] policies. 
              Once the user's sentence is
              fragmented into the relevant constituents as described
              above, it will identify the corresponding RDF 
              segments and create the RDF graph using the
             <a href="http://rdflib.net/">RDFLib</a> [3] Python library. 
            </p>
            <br/>
            
            <p><a href="http://dig.csail.mit.edu/TAMI/2007/AIR/">AIR</a> 
              is a rule-based policy language written in 
              <a href="http://www.w3.org/RDF/">RDF</a> 
              for accountability and access control. Each AIR policy 
              consists of one or more rules:</p>
            <br /><pre>policy = { rule* } </pre><br />
	        <p>Each rule is made up of a pattern that when matched 
              causes an <code>action</code> to be fired. Optionally 
              there could be <code>description</code> and 
              <code>justification</code> 
            elements as well.</p>
            <br /><pre>rule = { pattern, action [ description justification ] } </pre><br />   
            <p> An <code>action</code> can either be an <code>assertion</code> (which is a set of facts  that are added to the knowledge base) or a nested <code>rule</code>.
            </p>
            <br /> <pre>action = { assertion | rule } </pre> <br />
            <p>The basic skeleton of an AIR policy is as follows:</p>
            <br/>
            <pre>
:MyFirstPolicy a air:Policy;
air:rule [
	air:variable { ... };
	air:pattern { ... };
	air:assertion { ... };
	air:rule [ ... ]
	].
	 		</pre>
            
            <p>In order to generate an AIR policy from a constrained
            natural language sentence, two additional parameters,
            besides the policy itself, are required:</p><br />
            <p>
            <ol id="panel">
            <li>Name for the policy</li>
            <li>One or more domain ontologies</li>
            </ol></p><br />
            
            <p>The name of the policy is required because the user
              should be able to identify the policy, and it is not 
              suitable to have a random machine-generated 
              policy name. The domain ontology is required because 
              some of the sentence fragments need to be identified 
              with the corresponding term (which could be 
              either a subject or a predicate or an object) in RDF.
            </p><br/>
            
            <p>The following diagram shows how the dictionary keys and values from the Policy Interpreter is used in constructing the AIR policy.</p>
            <br/>
            <div style="text-align: center;">
            <img src="images/rdf_gen.png" width=80% height=80% alt="RDF Generation"/>
             <p>Figure 2: Mapping from the sentence fragments to the AIR Policy</p><br/>
            </div>

            <p>The dictionary element 'POLICY' is used to name 
              the policy. If the keys of dictionary elements:
              'ENTITY', 
              'ACTION', 'PURPOSE', 'DATA' and 'PASSIVE ENTITY'
              have non-null values, they are used to assign 
              <code>air:variable</code>s. The values of these
              dictionary
              elements are then mapped on to <code>air:pattern</code>s
              where each <code>air:variable</code>s in the previous 
              step are assigned the corresponding <code>rdf:type</code>s. 
            </p>
            <br/>
            <p>The sentence could include conditions 
             as in the following sentence: "Committee on
             Discipline can use prox card data if the Committee 
                on Discipline has authorization". 
            The conditional phrase beginning with the word "if" may refer to components which are already identified, such as "Committee on Discipline" 
            (identified as the 'ENTITY'). Therefore another 
            pattern will be constructed with the subject, predicate or object values encoded in the 'CONDITION'. </p><br/>
            
                <p>All of these steps requires proper identification
              of the RDF terms as defined in the ontology. <a href="http://www.w3.org/TR/rdf-sparql-query/">SPARQL</a> queries are used to identify the correct 
              term from the RDF ontolgy. There is an underlying assumption that for each <code>rdfs:Class</code> and <code>rdfs:Property</code>, which corresponds to 
              subject/object and predicate respectively,  the ontology author has 
              defined a proper <code>rdfs:label</code> attribute. This attribute is very much likely to be used by the policy author, hence the SPARQL query
              is to search for the subject which matches the dictionary value. </p><br/> 
            <p>Finally, the 'FLAG' sent by the Policy 
              Interpreter will determine to specify whether 'ACTION'
              is compliant or not with the policy.</p><br/>
                        
            <!-- End of RDF Generator -->
			
			<br />
			<br />
			<h3><a name="ref">References:</a></h3>
			<br />
			            
            <p>
	     <a name="1">[1]</a> Carolyne A. Brodie, Clare-Marie N. Karat, John Karat and JinJuan Feng. Usable Security and Privacy: 
	     A Case Study of Developing Privacy Management Tools. <i> SOUPS 2005 Symposium on Usable Privacy and Security </i>. ACM, April 2005 <br>
	     <a name="2">[2]</a> Clare-Marie Karat. SPARCLE (Server Privacy Architecture and Capability Enablement) policy workbench. <a href="http://domino.watson.ibm.com/comm/research.nsf/pages/r.security.innovation2.html"> 
http://domino.watson.ibm.com/comm/research.nsf/pages/r.security.innovation2.html</a>
<br>
             <a name="3">[3]</a> Steve Bird and Edward Loper. NLTK: The Natural
             Language Toolkit. <i>Proceedings of the ACL demonstration
             session</i>. pp 214-217, Barcelona, Association for
             Computational Lingusitics, July 2004. <br>
             <a name="4">[4]</a> Rob Speer. The 6.863 Feature Parsing Library. 
             <a href="http://web.mit.edu/6.863/www/parser">
               http://web.mit.edu/6.863/www/parser</a>

               <!-- References for the RDF Generator -->
			<br>               
             <a name="5">[5]</a> RDFLib <a href="http://rdflib.net/">http://rdflib.net/</a><br>
             <a name="6">[6]</a> Lalana Kagal, Chris Hanson, Daniel Weitzner at al, Decentralized Information Group, CSAIL, MIT. 
             AIR Policy Language <a href="http://dig.csail.mit.edu/TAMI/2007/AIR/">http://dig.csail.mit.edu/TAMI/2007/AIR/
</a><br>
	     <a name="7">[7]</a> Natural Language Toolkit Book <a href="http://nltk.org/index.php/Book"> http://nltk.org/index.php/Book  </a><br>
            <!-- End of References for the RDF Generator -->
               
<br/>
            </p>

	    </div>
		
		<!-- End Content -->
		
		<!-- Start Footer -->
		
		<div id="footer">
		</div>
		
		<!-- End Footer -->
		 
    </div>
	
    <!-- End Wrapper -->
	
</body>
</html>
